Accepted for publication in Microprocessors and Microsystems.

Introduction

Recent advances in VLSI technology have made it possible to use caches large enough to eliminate most of the "conventional" cache misses resulting from limited cache size and/or set associativity. As a result, other "non-conventional" cache misses that used to be obscured by the more frequent conventional misses are becoming increasingly dominant in the makeup of total cache misses. In order to reduce these non-conventional cache misses, there have been attempts to characterize realistic workloads and identify their sources. An example of such attempts is the so-called "cache affinity scheduling" [1-5], which aims at reducing cache misses due to context switching. Another example of cache optimization for non-conventional misses arises in dynamic heap allocation. Functional programming languages such as LISP make extensive use of the dynamic heap. Peng and Sohi reported a significant performance improvement by characterizing the behavior of heap references and optimizing their cache performance [6].

This paper focuses on another source of non-conventional cache misses: those that occur during page clearing in a virtual memory system. In a virtual memory system, when a physical page is remapped to a new virtual page, the previous contents of the physical page must be cleared by overwriting the whole page with zeros for security reasons. This results in back-to-back write accesses to all the blocks in the page. These back-to-back write accesses not only cause cache misses for blocks not in the cache, but also pollute the cache with blocks that are not immediately needed. Results in the published literature [7-9] show that the performance degradation resulting from block memory operations, of which the page clearing operation is an instance, is quite significant. In this paper, we propose a lazy (on-demand) page clearing scheme that delays actual clearing until the blocks of the cleared page are actually accessed.
When a block is actually accessed, it is cleared in-cache using a hardware zero register, thus eliminating costly main memory accesses.

The rest of this paper is organized as follows. The next section presents a brief review of related work. We then describe in detail the proposed on-demand, in-cache clearing scheme and qualitatively analyze the cache overheads related to page clearing. In the following section, we describe the simulator and the traces used to assess the performance improvement offered by the proposed scheme. The following section presents the results from …
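The lazy clearing idea can be illustrated with a small software model. The sketch below is not the authors' hardware design: the class name, the 4 KB page size, and the 32-byte block size are all illustrative assumptions. A remapped page is merely flagged as pending-clear, and each cache-block-sized chunk is zeroed on its first access, standing in for the in-cache clear that the hardware zero register performs.

```python
PAGE_SIZE = 4096   # bytes per page (assumed for illustration)
BLOCK_SIZE = 32    # bytes per cache block (assumed for illustration)
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE

class LazyClearedPage:
    """Software model of on-demand page clearing (illustrative only)."""

    def __init__(self):
        self.data = bytearray(b"\xff" * PAGE_SIZE)     # stale prior contents
        self.needs_clear = [False] * BLOCKS_PER_PAGE   # per-block pending flag

    def remap(self):
        # Eager clearing would write zeros to every block here, producing
        # back-to-back memory writes. Instead, only mark each block as
        # pending-clear; no memory traffic occurs yet.
        self.needs_clear = [True] * BLOCKS_PER_PAGE

    def read(self, offset):
        blk = offset // BLOCK_SIZE
        if self.needs_clear[blk]:
            # "In-cache" clear: zero only this block, on first touch,
            # mimicking the hardware zero register filling the cache line.
            start = blk * BLOCK_SIZE
            self.data[start:start + BLOCK_SIZE] = bytes(BLOCK_SIZE)
            self.needs_clear[blk] = False
        return self.data[offset]

page = LazyClearedPage()
page.remap()
print(page.read(0))            # block 0 is zeroed on first access
print(page.needs_clear[1])     # untouched blocks remain pending
```

In this model, the cost of clearing is paid one block at a time and only for blocks the program actually touches, which is the behavior the proposed scheme realizes in hardware without main memory writes.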

Publication date: 1998